
    Evaluation of the Importance of Time-Frequency Contributions to Speech Intelligibility in Noise

    Recent studies on binary masking techniques assume that each time-frequency (T-F) unit contributes equally to the overall intelligibility of speech. The present study demonstrates that the importance of each T-F unit to speech intelligibility varies with the speech content. Specifically, T-F units are divided into two classes: speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit is strongly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies with the loudness of its masker component. Two types of mask error are also considered: miss errors and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance of the two error types depends on the SNR of the input speech signal. Based on these observations, a mask-based objective measure, the loudness-weighted hit-false, is proposed for predicting speech intelligibility. The proposed measure correlates significantly more highly with intelligibility than two existing mask-based objective measures.
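The abstract does not give a formula for the loudness-weighted hit-false measure. As a rough illustration only, the sketch below starts from the conventional HIT-FA score (hit rate over target-dominated units minus false-alarm rate over masker-dominated units) and, following the abstract's description, weights each speech-present unit by a target-loudness value and each speech-absent unit by a masker-loudness value. The function names, the normalisation, and the assumption that per-unit loudness arrays are already available are hypothetical; this is not presented as the authors' method.

```python
import numpy as np


def hit_fa(est_mask, ideal_mask):
    """Conventional HIT-FA: hit rate on target-dominated units minus
    false-alarm rate on masker-dominated units (every unit weighted equally)."""
    est = np.asarray(est_mask, dtype=bool)
    ideal = np.asarray(ideal_mask, dtype=bool)
    hit = np.sum(est & ideal) / np.sum(ideal)
    fa = np.sum(est & ~ideal) / np.sum(~ideal)
    return float(hit - fa)


def loudness_weighted_hit_fa(est_mask, ideal_mask, target_loudness, masker_loudness):
    """Hypothetical loudness-weighted HIT-FA (illustrative assumption).

    Speech-present units (ideal == 1) are weighted by target loudness and
    speech-absent units (ideal == 0) by masker loudness. Assumes each class
    contains at least one unit with positive loudness.
    """
    est = np.asarray(est_mask, dtype=bool)
    ideal = np.asarray(ideal_mask, dtype=bool)
    w_present = np.where(ideal, np.asarray(target_loudness, dtype=float), 0.0)
    w_absent = np.where(~ideal, np.asarray(masker_loudness, dtype=float), 0.0)
    # Weighted hit rate: share of target loudness retained by the estimated mask.
    hit = np.sum(w_present * est) / np.sum(w_present)
    # Weighted false-alarm rate: share of masker loudness wrongly let through.
    fa = np.sum(w_absent * est) / np.sum(w_absent)
    return float(hit - fa)


# Toy 2x2 example (rows = frequency channels, columns = frames).
ideal = [[1, 0], [1, 0]]
est = [[1, 1], [0, 0]]
target_loud = [[3.0, 0.0], [1.0, 0.0]]
masker_loud = [[0.0, 1.0], [0.0, 3.0]]
print(hit_fa(est, ideal))                                              # 0.0
print(loudness_weighted_hit_fa(est, ideal, target_loud, masker_loud))  # 0.5
```

In the toy case the unweighted score is zero because one of two target units is missed and one of two masker units is a false alarm, while the weighted score rises because the retained target unit is the louder one and the false alarm falls on the quieter masker unit.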

    A new mask-based objective measure for predicting the intelligibility of binary masked speech

    Mask-based objective speech-intelligibility measures have been proposed and used successfully to evaluate the performance of binary masking algorithms. These measures are computed directly by comparing the estimated binary mask against the ground-truth ideal binary mask (IdBM). Most of them, however, assign equal weight to all time-frequency (T-F) units. In this study, we propose to improve existing mask-based objective measures by weighting each T-F unit according to its target or masker loudness. The proposed measure performs significantly better than two other existing mask-based objective measures.
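Because these measures compare an estimated mask against the IdBM, it may help to recall the usual construction of an ideal binary mask: a T-F unit is labelled 1 when its local target-to-masker ratio exceeds a local criterion (LC) and 0 otherwise. The Python sketch below assumes target and masker power are already available per T-F unit (for example from an STFT or a filterbank). The function name and the default `lc_db=0.0` are illustrative assumptions, since the abstract specifies neither the LC nor the T-F decomposition.

```python
import numpy as np


def ideal_binary_mask(target_power, masker_power, lc_db=0.0):
    """Label a T-F unit 1 when its local SNR (dB) exceeds the local criterion lc_db."""
    target = np.asarray(target_power, dtype=float)
    masker = np.asarray(masker_power, dtype=float)
    eps = 1e-12  # small floor so silent units do not cause division by zero or log(0)
    local_snr_db = 10.0 * np.log10((target + eps) / (masker + eps))
    return local_snr_db > lc_db


target = [[4.0, 1.0], [1.0, 1.0]]
masker = [[1.0, 4.0], [1.0, 0.5]]
print(ideal_binary_mask(target, masker).astype(int).tolist())              # [[1, 0], [0, 1]]
print(ideal_binary_mask(target, masker, lc_db=-5.0).astype(int).tolist())  # [[1, 0], [1, 1]]
```

Lowering `lc_db` labels more units as speech-present, which is why the choice of LC matters when an estimated mask is scored against the IdBM.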

    Preference for 20-40 ms window duration in speech analysis

    In speech processing, the short-time magnitude spectrum is believed to contain most of the information relevant to speech intelligibility, and it is normally computed using the short-time Fourier transform (STFT) over a 20–40 ms analysis window. In this paper, we systematically investigate the effect of the analysis window duration on speech intelligibility. For this purpose, both subjective and objective experiments are conducted. The subjective experiment takes the form of a consonant recognition task performed by human listeners, whereas the objective experiment takes the form of an automatic speech recognition (ASR) task. In both experiments, various analysis window durations are investigated. For the subjective experiment, we construct speech stimuli based purely on the short-time magnitude information. The results of the subjective experiment show that an analysis window duration of 15–35 ms is the optimum choice when speech is reconstructed from the short-time magnitude spectrum. Similar conclusions were drawn from the results of the objective (ASR) experiment, and the ASR results were found to have a statistically significant correlation with the subjective intelligibility results. Index Terms: analysis window duration, magnitude spectrum, automatic speech recognition, speech intelligibility
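To make the window-duration parameter concrete, here is a minimal numpy sketch that computes a short-time magnitude spectrum for a window length given in milliseconds. The Hamming window, the 25% hop, and the name `stft_magnitude` are assumptions for illustration; the abstract does not state the window shape, the overlap, or how the magnitude-only stimuli were resynthesised.

```python
import numpy as np


def stft_magnitude(x, fs, win_ms=32.0, hop_fraction=0.25):
    """Short-time magnitude spectrum using a Hamming window of win_ms milliseconds.

    Returns an array of shape (n_frames, n_win // 2 + 1).
    Assumes len(x) is at least one window long.
    """
    x = np.asarray(x, dtype=float)
    n_win = int(round(fs * win_ms / 1000.0))        # window length in samples
    hop = max(1, int(round(n_win * hop_fraction)))  # frame advance in samples
    window = np.hamming(n_win)
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop:i * hop + n_win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))      # discard phase, keep magnitude


fs = 16000
t = np.arange(fs) / fs                  # 1 s of signal
x = np.sin(2 * np.pi * 1000.0 * t)      # 1 kHz tone
mag = stft_magnitude(x, fs, win_ms=32.0)
print(mag.shape)                        # (122, 257)
print(int(np.argmax(mag[0])))           # 32 (1000 Hz / 31.25 Hz per bin)
```

At 16 kHz a 32 ms window spans 512 samples, giving 257 non-negative frequency bins spaced 31.25 Hz apart; shortening the window improves time resolution at the cost of frequency resolution, which is the trade-off the window-duration experiments probe.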

    The Effect of the Additivity Assumption on Time and Frequency Domain Wiener Filtering for Speech Enhancement

    In this paper, we investigate the validity of the common assumption in Wiener filtering that the clean speech and noise signals are uncorrelated under the short-time analysis typically used for speech enhancement. To this end, we performed speech enhancement experiments in which speech corrupted by additive white Gaussian noise is enhanced by a Wiener filter designed in both the time and the frequency domains. Results of oracle-style experiments confirm that including the additivity assumption in Wiener filtering results in negligible degradation of enhanced speech quality. Informal listening tests indicate that the background noise resulting from time-domain enhancement is more tolerable than that resulting from the frequency-domain framework. Index Terms: Wiener filtering, speech enhancement
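The additivity assumption can be illustrated in the frequency domain. For one frame with clean spectrum S and noise spectrum N, the noisy power is |S + N|^2 = |S|^2 + |N|^2 + 2 Re(S N*); the familiar oracle Wiener gain drops the cross term and uses |S|^2 / (|S|^2 + |N|^2). The sketch below compares that gain with the real-valued gain Re(S Y*) / |Y|^2, which minimises |G Y - S|^2 in each bin when the cross term is kept. This is an illustrative, frequency-domain-only construction assuming oracle knowledge of S and N, with a hypothetical function name and random stand-in signals; it is not necessarily the time- or frequency-domain filter used in the paper.

```python
import numpy as np


def oracle_wiener_gains(clean_frame, noise_frame):
    """Oracle per-bin gains for one frame, with and without the additivity assumption."""
    window = np.hamming(len(clean_frame))
    S = np.fft.rfft(np.asarray(clean_frame, dtype=float) * window)
    N = np.fft.rfft(np.asarray(noise_frame, dtype=float) * window)
    Y = S + N
    # Additivity assumption: cross term Re(S N*) dropped, so |Y|^2 ~ |S|^2 + |N|^2.
    g_additive = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2)
    # Cross term kept: real gain minimising |G Y - S|^2 in each bin.
    g_exact = np.real(S * np.conj(Y)) / np.abs(Y) ** 2
    return g_additive, g_exact, S, Y


rng = np.random.default_rng(0)
clean = rng.standard_normal(512)        # stand-in for a clean speech frame
noise = 0.5 * rng.standard_normal(512)  # additive white Gaussian noise
g_add, g_ex, S, Y = oracle_wiener_gains(clean, noise)
err_add = np.sum(np.abs(g_add * Y - S) ** 2)
err_ex = np.sum(np.abs(g_ex * Y - S) ** 2)
print(err_ex <= err_add)                # True
```

Because `g_exact` is the per-bin minimiser for the frame, its error can only match or beat the additive gain; the size of the gap indicates how much the dropped cross term matters for that frame.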